It is widely held that the Random Energy Model (REM) describes the freezing transition of a variety of types of
heteropolymers. We demonstrate that the hallmark property of REM, statistical independence of the energies of 
states over disorder, is violated in different ways for models commonly employed in heteropolymer freezing studies. 
The implications for proteins are also discussed. 



I. INTRODUCTION 

Heteropolymer freezing is widely recognized as a model for protein folding. By heteropolymer
freezing, we refer to the transition from a phase in which many conformations ($O(e^N)$) dominate
equilibrium to one in which only one or very few ($O(1)$) are thermodynamically relevant.
Remarkable progress in the understanding of heteropolymer freezing has been achieved in recent
years, mainly due to concepts borrowed from the statistical mechanics of spin glasses, such as
the so-called Random Energy Model (REM) first suggested by Derrida [1].

Noting the properties of REM, namely a rugged free energy landscape and the statistical
independence of states, Bryngelson and Wolynes first proposed the use of REM for protein folding.
Other approaches, which employed the sophisticated machinery of replica mean field theory and
began with a microscopic, although simplified, model of random heteropolymer chains, also led
to REM-like conclusions ||.

However simple technically, REM and related ideas have proven to be extremely fruitful in
protein folding studies [||-[7j. REM has been so successful that at present it is often
considered not just as a model, which may or may not be applicable under different circumstances,
but as a simple and adequate universal language. While REM may seem like technical theoretical
jargon, it is closely related to a set of physical insights that are often used without
mentioning (and perhaps without considering) any theoretical concepts at all (e.g., Ref. [9]).

The validity of REM was proven P,|10{ for maximally compact heteropolymers with a Gaussian
distribution of monomer interaction energies. On the other hand, there are many observations
that are hard to understand using REM. Thus, the goal of the present work is to scrutinize the
real strength and validity of REM in the context of heteropolymer freezing and protein folding
studies. We will show that REM is indeed often applicable, but far from universally so; its
applicability is a highly non-trivial property, and it fails or is at least questionable in
many cases.



II. QUALITATIVE CONSIDERATIONS 

Heteropolymer freezing is said to be due to the frustrated interplay between three fundamental
polymeric elements: the sequence, the set of conformations available to the given chain, and the
nature of the interactions between monomer species. More specifically, the freezing of a polymer
in a certain conformation is governed by the competition of two conflicting factors. First, how
well are the interactions adjusted in this conformation? Clearly, this depends on the character
of the interactions between monomers. A broad distribution of the types of monomer species
favors energetically strong preferences between conformations. Second, how many conformations
exist in the vicinity of a given conformation? This defines the entropy and depends entirely on
the internal geometry of the space of available conformations (chain flexibility, topology, etc.).

REM specifies this picture of freezing by the assertion that, for each sequence, the polymer
energy $E_\alpha$ of conformation $\alpha$ is drawn randomly from a probability distribution over
disorder $P(E)$, which is the same for all $\alpha$; furthermore, the energies are drawn
independently, such that the joint probability is directly expressed in terms of $P(E)$:
$P(E_\alpha, E_\beta) = P(E_\alpha) P(E_\beta)$. Obviously, this cannot be valid exactly. Indeed,
for a conformation $\beta$ which is only a minor conformational rearrangement of $\alpha$, many
contributions to the respective energies are identical and thus the energies are clearly
correlated. REM validity requires that such pairs of conformations are rare.
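The freezing picture that REM implies can be illustrated numerically. The following is a minimal sketch, not a calculation from this work: it assumes $2^N$ states whose energies are independent Gaussian variables with variance $N$ (an illustrative choice), and it uses the participation ratio $Y = \sum_\alpha p_\alpha^2$ of the Boltzmann weights, so that $1/Y$ estimates the number of states that dominate equilibrium. The function names are hypothetical.

```python
import math
import random

def participation_ratio(energies, temperature):
    """Return Y = sum_a p_a^2 for Boltzmann weights p_a ~ exp(-E_a / T).

    1/Y estimates how many states dominate equilibrium.
    """
    e_min = min(energies)  # shift energies so the largest weight is exactly 1
    weights = [math.exp(-(e - e_min) / temperature) for e in energies]
    z = sum(weights)
    return sum((w / z) ** 2 for w in weights)

def sample_rem_energies(n_monomers, rng):
    """Draw 2**N independent Gaussian energies with variance N (assumed model)."""
    sigma = math.sqrt(n_monomers)
    return [rng.gauss(0.0, sigma) for _ in range(2 ** n_monomers)]

rng = random.Random(0)  # fixed seed so the run is reproducible
energies = sample_rem_energies(12, rng)
for temperature in (3.0, 1.0, 0.3):
    y = participation_ratio(energies, temperature)
    print(f"T = {temperature:3.1f}   dominant states ~ {1.0 / y:8.1f}")
```

Lowering the temperature can only increase $Y$, so the printed number of dominant states shrinks from a large value toward a few states, which is the transition from $O(e^N)$ to $O(1)$ relevant conformations described in the Introduction.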

The simplest quantitative measure of statistical dependence between the energies of two given
conformations $\alpha$ and $\beta$ over the set of sequences is the correlator
$\langle E_\alpha E_\beta \rangle - \langle E_\alpha \rangle \langle E_\beta \rangle = \langle \delta E_\alpha\, \delta E_\beta \rangle$,
where $\langle \ldots \rangle$ denotes averaging over sequences. REM invalidity can be
demonstrated by the non-vanishing of this correlator.
We consider the Hamiltonian

$$ H = \sum_{I,J}^{N} B_{s_I s_J}\, \delta(\mathbf{r}_I - \mathbf{r}_J) \qquad (1) $$

where $s_I \in \{1, \ldots, q\}$ is the species of monomer number $I$, $q$ is the number of
species, $B_{ij}$ is the matrix of species-species interactions, $\mathbf{r}_I$ is the position
of monomer $I$, and $\delta(\mathbf{r} - \mathbf{r}')$ is the function that is concentrated on
nearest neighboring space points. We implicitly assume here that the position vectors
$\mathbf{r}_I$ are such that the conditions of chain connectivity (unit spatial distance between
sequential monomers), excluded volume, and constant density (on the lattice, for instance, every
site is occupied by one and only one monomer) are all met. This is the most general model for
the case when heterogeneity comes exclusively from pairwise, short range interactions $B$. For
the Hamiltonian (1), we find



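$$ \langle \delta E_\alpha\, \delta E_\beta \rangle = \langle \delta B^2 \rangle\, Q_{\alpha\beta} + N z^2 \sum_{ijk} p_i\, \delta B_{ij}\, p_j\, \delta B_{jk}\, p_k \qquad (2) $$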



where $p_i$ is the probability of finding a monomer of species $i$,
$\delta B_{ij} = B_{ij} - \sum_{kl} p_k B_{kl} p_l$,
$\langle \delta B^2 \rangle = \sum_{ij} p_i (\delta B_{ij})^2 p_j$ is the variance of the elements
of the interaction matrix, and
$Q_{\alpha\beta} = \sum_{I,J} \delta(\mathbf{r}_I^\alpha - \mathbf{r}_J^\alpha)\, \delta(\mathbf{r}_I^\beta - \mathbf{r}_J^\beta)$
is the conventionally defined overlap between conformations |5|,[n|. We have taken into account
here the condition that the polymer is maximally compact, so that each monomer has $z$ space
neighbors, and thus
$\sum_{I,J,K} \delta(\mathbf{r}_I^\alpha - \mathbf{r}_J^\alpha)\, \delta(\mathbf{r}_J^\beta - \mathbf{r}_K^\beta) = N z^2$.
Thus, we see that there are two contributions to the correlator, one dependent on conformations
and the other independent.

Eq. (2) formally shows how aspects of interactions ($B_{ij}$), conformation space
($Q_{\alpha\beta}$), and sequences ($p_i$) enter into the nature of statistical dependence. In
the following section, we use eq. (2) to demonstrate deviations from REM induced by manipulating
each of these three elements.
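The conformation dependent term of eq. (2) can be checked exactly on a very small system. The sketch below is an illustration under stated assumptions, not the computation used in this work: it takes two hand-picked compact 8-mers on the 2 x 2 x 2 cube, Ising interactions $B_{ij} = \delta_{ij}$ with all $2^8$ sequences equally likely (so that $p_i = 1/2$ and the conformation independent term vanishes), and it counts only non-bonded contacts in the overlap. All function names are hypothetical.

```python
from itertools import product

def contacts(path):
    """Non-bonded nearest-neighbor pairs (I, J), I < J, for a lattice path.

    Chain-bonded pairs (J = I + 1) are skipped: they are the same in every
    conformation (an assumption of this sketch about how overlap is counted).
    """
    pairs = set()
    for i, a in enumerate(path):
        for j in range(i + 2, len(path)):
            b = path[j]
            if sum(abs(x - y) for x, y in zip(a, b)) == 1:
                pairs.add((i, j))
    return pairs

def energy(pairs, sequence):
    """Ising interactions B_ij = delta_ij: each like-species contact adds 1."""
    return sum(1 for i, j in pairs if sequence[i] == sequence[j])

def covariance(pairs_a, pairs_b, n, q=2):
    """Exact <dE_a dE_b> averaged over all q**n equally likely sequences."""
    seqs = list(product(range(q), repeat=n))
    ea = [energy(pairs_a, s) for s in seqs]
    eb = [energy(pairs_b, s) for s in seqs]
    mean_a = sum(ea) / len(seqs)
    mean_b = sum(eb) / len(seqs)
    return sum(x * y for x, y in zip(ea, eb)) / len(seqs) - mean_a * mean_b

# Two compact 8-mers on the 2 x 2 x 2 cube (hand-picked for illustration).
alpha = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0),
         (1, 1, 0), (1, 0, 0), (1, 0, 1), (1, 1, 1)]
beta = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0),
        (1, 1, 0), (1, 1, 1), (1, 0, 1), (1, 0, 0)]

c_alpha, c_beta = contacts(alpha), contacts(beta)
overlap = len(c_alpha & c_beta)   # Q_ab: contacts shared by both conformations
var_b = 0.25                      # <dB^2> for B_ij = delta_ij and p_i = 1/2
print(overlap, covariance(c_alpha, c_beta, n=8), var_b * overlap)
```

For this choice of interactions the last two printed numbers agree, since the like-contact indicators of distinct monomer pairs are pairwise independent and only shared contacts contribute to the correlator.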



III. DEVIATIONS FROM REM 
A. Conformations 

The first term of the energy correlator eq. (2) is proportional to the number of bonds that
conformations $\alpha$ and $\beta$ have in common, $Q_{\alpha\beta}$; clearly, identical bonds
give identical contributions to the energies of both conformations, thus yielding a dependence.
However, if large $Q_{\alpha\beta}$ are rare, then REM remains a good approximation. To examine
this for the simple computational models typically employed, we have computed
$P(Q) = \sum_{\alpha\beta} \delta(Q_{\alpha\beta} - Q)$ for a variety of conformational spaces.
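A histogram of this kind can be sketched for a toy conformational space. The example below is only an illustration and makes several assumptions that are not taken from this work: it enumerates all space-filling walks in a small box by brute force, treats conformations with identical contact maps (for instance lattice-symmetry copies) as one, excludes chain-bonded pairs from the overlap, and counts each pair of distinct conformations once. The function names are hypothetical, and brute-force enumeration is practical only for very small boxes.

```python
from collections import Counter

def compact_paths(shape):
    """Enumerate all directed self-avoiding walks that fill a box of this shape."""
    sites = [(x, y, z) for x in range(shape[0])
             for y in range(shape[1])
             for z in range(shape[2])]
    site_set = set(sites)
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    found = []

    def extend(path, visited):
        if len(path) == len(sites):
            found.append(tuple(path))
            return
        x, y, z = path[-1]
        for dx, dy, dz in steps:
            nxt = (x + dx, y + dy, z + dz)
            if nxt in site_set and nxt not in visited:
                path.append(nxt)
                visited.add(nxt)
                extend(path, visited)
                visited.remove(nxt)
                path.pop()

    for start in sites:
        extend([start], {start})
    return found

def contact_map(path):
    """Set of non-bonded nearest-neighbor monomer pairs (I, J) with I < J."""
    return frozenset(
        (i, j)
        for i in range(len(path))
        for j in range(i + 2, len(path))
        if sum(abs(a - b) for a, b in zip(path[i], path[j])) == 1
    )

def overlap_histogram(paths):
    """Unnormalized P(Q): number of pairs of distinct contact maps at overlap Q."""
    maps = list({contact_map(p) for p in paths})  # drop duplicate contact maps
    hist = Counter()
    for a in range(len(maps)):
        for b in range(a + 1, len(maps)):
            hist[len(maps[a] & maps[b])] += 1
    return hist

paths = compact_paths((2, 2, 2))
hist = overlap_histogram(paths)
for q in sorted(hist):
    print(q, hist[q])
```

The 2 x 2 x 2 box is far too small to show the exponential decay discussed in the text; it serves only to make the definition of $P(Q)$ concrete.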

One type of space consists of maximally compact conformations (27-mers and 36-mers on the
3 x 3 x 3 and 3 x 3 x 4 lattices, respectively), which are typically used in freezing
simulations |^,|l^,|l2|; maximally compact conformations of relatively short chains favor REM
validity, as small rearrangements are not possible and $P(Q)$ will have virtually no
contributions in the large $Q$ range. In comparison, we considered maximally compact crumpled
64-mer conformations on the 4 x 4 x 4 lattice. In crumpled conformations, neighbors along the
chain are likely to be neighbors in space [13]. In this sense, crumpled conformations perhaps
crudely model the effect of secondary structure. In our lattice model, the 4 x 4 x 4 cube can be
broken down into eight 2 x 2 x 2 subcubes, and crumpled conformations fill every site in a given
subcube before entering a new one.
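The crumpling rule stated above can be written as a simple predicate. This is a minimal sketch under the assumption that a conformation is given as an ordered list of integer lattice sites; the function name and the `block` parameter are hypothetical, and the check does not verify chain connectivity.

```python
def is_crumpled(path, block=2):
    """True if the walk fills each subcube of edge `block` before leaving it.

    `path` is a sequence of integer lattice sites in chain order. Works in any
    dimension; unit steps between successive sites are NOT checked here.
    """
    dim = len(path[0])
    cell_size = block ** dim          # number of sites in one subcube
    if len(path) % cell_size != 0:
        return False
    seen_cells = set()
    for start in range(0, len(path), cell_size):
        chunk = path[start:start + cell_size]
        cells = {tuple(c // block for c in site) for site in chunk}
        if len(cells) != 1 or len(set(chunk)) != cell_size:
            return False              # chunk spans two subcubes or repeats a site
        cell = cells.pop()
        if cell in seen_cells:
            return False              # re-entered a subcube that was already filled
        seen_cells.add(cell)
    return True

# Two-dimensional toy example on a 4 x 2 lattice with 2 x 2 blocks.
crumpled_walk = [(0, 0), (0, 1), (1, 1), (1, 0), (2, 0), (2, 1), (3, 1), (3, 0)]
plain_walk = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (2, 1), (1, 1), (0, 1)]
print(is_crumpled(crumpled_walk), is_crumpled(plain_walk))
```

For the 64-mers of the text, the same call with the default `block=2` would test whether each run of eight successive monomers fills one 2 x 2 x 2 subcube.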

FIG. 1. $\Delta S(Q) \sim \ln[P(Q)/P(Q = Q_{\max})]$ for compact 27-mers, compact 36-mers, and
compact, crumpled 64-mers. The discrete region boundary varies greatly:
$Q_d/Q_{\max} \sim 0.6, 0.4, 0.7$ for 27-mers, 36-mers, and crumpled 64-mers, respectively.

FIG. 2. $P(Q/Q_{\max})$ for crumpled, compact 64-mers. We compare the results for conformations
taken at random with those obtained by comparing the degenerate ground state conformations of
1000 sequences with Ising interactions. Overlapping ground states signal a departure from REM.

As we see in Fig. 1, $P(Q)$ for all conformational spaces studied is peaked at small $Q$ (this
peak is not at $Q = 0$ due to finite size effects). As $Q$ increases, $P(Q)$ decreases
exponentially, and above a particular $Q_d$, there are at most $O(1)$ conformations available.
Thus, for a space with small $Q_d$, there are few states with large overlap and REM is favored.
In Fig. 1, we see that crumpled conformations have the greatest $Q_d$, as they allow a greater
possibility of rearrangements on small scales (large $Q$).

With the knowledge of the relative favorability of a given conformation space for REM validity,
what is the effect of these statistical dependencies on the nature of freezing? While REM
formally means that all states are statistically independent, what is relevant is the low energy
states. For example, degenerate ground states (as are common in models with "discrete"
interactions) will not overlap if they are statistically independent. This holds for 27-mers and
36-mers with Potts interactions ($B_{ij} = \delta_{ij}$), whose ground states yield a $P(Q)$
which is indistinguishable from that of conformations taken at random jl(], 0. In the light of
our previous discussion, REM appears to be valid because $Q_d$ is sufficiently small. The
situation is different for crumpled 64-mers: upon enumerating [12] the energies of all
conformations for 1000 sequences with Ising (2 letter Potts) interactions and comparing the
ground states (Fig. 2), we see that the increase in $Q_d$ for crumpled 64-mers is sufficient
that REM fails for this conformational space. This does not merely demonstrate that REM breaks
down for crumpled 64-mers in particular; rather, REM validity is clearly not an a priori
property of all conformation spaces in general, even in three dimensions p|.



B. Interactions 

Interactions enter into the conformation dependent term of eq. (2) through the variance of the
elements of the interaction matrix, and thus this term is present for all types of interactions.
However, interactions play a more dramatic role in the conformation independent term in eq. (2):
it vanishes for many models, but not, for example, if there is one monomer species that
interacts particularly strongly with all others; in this case, there is a correlated
contribution even when there is no bond in common. The appearance of the conformation
independent term signals a departure from REM, as even states with vanishing $Q$ are
statistically dependent.

What forms of interactions have this residual (conformation independent) statistical dependence?
One notable example is the HP model ($B_{ij} = \delta_{i1}\delta_{ij}$) ||. For a composition
that is even overall ($p_i = 1/q$), but not fixed for a particular sequence, the conformation
independent term does not vanish. However, for a composition fixed to be even for each sequence
and for finite chains, there is an additional term in eq. (2) which leads to a negative
conformation independent constant. Thus, typical conformations will have a small overlap, but
the effect of this overlap is canceled by the negative constant, and the result is an effective
statistical independence for typical conformations (but anti-dependence (dependence) for
conformations with smaller (larger) overlap). Thus, REM may appear to be a good approximation,
but only due to a complicated cancellation of factors. Upon examination with eq. (2), the
Miyazawa and Jernigan matrix (MJM) [15], a set of amino acid interaction potentials derived from
protein statistics, seems like the HP model plus some noise; thus our analysis for HP is
applicable to MJM. For models with an even composition and a symmetric contribution from monomer
species, such as Potts or the Independent Interaction Model (IIM), in which the $B_{ij}$ are
taken from a Gaussian distribution |s|, the conformation independent term in eq. (2) vanishes
and the only statistical dependence comes from conformational overlap.



C. Sequences 

An uneven composition also leads to the non-vanishing of the conformation independent term (and
thus a statistical dependence) in many models, including Potts and IIM, since there is an
imbalance as in the HP model discussed above.
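The two coefficients that enter eq. (2) are easy to evaluate for a given interaction matrix and composition. The following sketch is an illustration, not code from this work: it assumes a symmetric matrix $B_{ij}$, two species, and matrices written with entries 0 and 1 (the overall sign of $B$ does not affect either coefficient). The function name is hypothetical.

```python
def interaction_terms(b, p):
    """Return (<dB^2>, T) for a symmetric interaction matrix b and composition p.

    <dB^2> = sum_ij p_i dB_ij^2 p_j multiplies the overlap Q in eq. (2);
    T = sum_ijk p_i dB_ij p_j dB_jk p_k multiplies N z^2.
    """
    q = len(p)
    mean_b = sum(p[k] * b[k][l] * p[l] for k in range(q) for l in range(q))
    db = [[b[i][j] - mean_b for j in range(q)] for i in range(q)]
    variance = sum(p[i] * db[i][j] ** 2 * p[j] for i in range(q) for j in range(q))
    # Mean deviation seen by species j, averaged over partners: h_j = sum_i p_i dB_ij.
    field = [sum(p[i] * db[i][j] for i in range(q)) for j in range(q)]
    # For symmetric b, T reduces to sum_j p_j h_j^2.
    residual = sum(p[j] * field[j] ** 2 for j in range(q))
    return variance, residual

potts = [[1.0, 0.0], [0.0, 1.0]]  # B_ij = delta_ij (Ising, q = 2)
hp = [[1.0, 0.0], [0.0, 0.0]]     # only species 1 interacts with itself

print("Potts, even composition:  ", interaction_terms(potts, [0.5, 0.5]))
print("HP, even composition:     ", interaction_terms(hp, [0.5, 0.5]))
print("Potts, uneven composition:", interaction_terms(potts, [0.7, 0.3]))
```

With these inputs the second coefficient is zero only for the Potts matrix at even composition; it is nonzero for the HP matrix and for the Potts matrix at uneven composition, in line with the statements of Secs. III B and III C.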

However, while we have been speaking about random sequences, protein sequences are not random.
Indeed, theoretically, models of "minimal frustration" Q, "sequence selection" fij], or
"imprinting" j|], in which sequences are chosen to have low energies in a desired target
conformation *, have been employed as a better model of proteins. If states were truly
statistically independent, these procedures would act to "pull down" the energy of the ground
state alone; however, statistical dependence between states means that the energies of other
states are pulled down as well. Moreover, sequence selection has also been experimentally
realized in de novo protein design ||, and REM is an implicit assumption in such experiments: if
design leads to many states (not just the desired state *) with low energy, this procedure fails.

For our purpose of describing the validity of REM, sequence selection is also a useful tool to
demonstrate statistical dependence. In this sense, selection acts as a "field" with which we
test the response of the energies of other conformations to the manipulation of the energy of *.
In Fig. 3, we see from the density of states,
$P(E,Q) = \sum_\alpha \delta(Q - Q_{\alpha *})\, \delta(E - H_\alpha)$, of a well designed
sequence with Ising interactions that sequence selection acts on all states $\alpha$ to a degree
that is roughly linear in $Q_{\alpha *}$. Therefore, the energy of a particular conformation
$\alpha$ is largely determined by $Q_{\alpha *}$. Thus, we see that sequence selection does not
pull down the energy of the desired conformation alone but rather affects the whole density of
states. This is a dramatic deviation from REM.
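One heuristic way to see why the response should be roughly linear in $Q_{\alpha *}$ is the following sketch. It is an illustration under simplifying assumptions, not a derivation from this work: the contacts of $\alpha$ are split into those shared with * and the rest, each shared contact is assumed to carry on average the designed energy per contact $\epsilon_*$, each unshared contact is assumed to carry the sequence-averaged interaction $\bar{B}$, and correlations through shared monomers (the conformation independent term) are neglected. The symbols $\epsilon_*$, $\bar{B}$ and $E_*$ are introduced here for illustration only.

```latex
\langle E_\alpha \rangle_{\mathrm{design}}
  \approx Q_{\alpha *}\,\epsilon_* + \left(Q_{\max} - Q_{\alpha *}\right)\bar{B}
  = Q_{\max}\,\bar{B} - Q_{\alpha *}\left(\bar{B} - \epsilon_*\right),
\qquad
\epsilon_* = \frac{E_*}{Q_{\max}}, \quad
\bar{B} = \sum_{ij} p_i B_{ij} p_j .
```

In this estimate a conformation with $Q_{\alpha *} = Q_{\max}$ recovers the designed energy $E_*$, while one with no contacts in common stays at the undesigned average $Q_{\max}\bar{B}$.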

However, this effect is not necessarily detrimental. For models with a degree of statistical
anti-dependence (e.g., fixed, even composition HP), selection may act to push up the energy
levels of conformations other than *. Moreover, as this statistical anti-dependence varies
between sequences, it could be considered as a sequence selection criterion (in addition to
optimization of the ground state) and therefore is a possible means to "design out" unwanted
conformations [11]. Also, recent models of "protein folding funnels" p] require conformational
space deviations from REM: without some relationship between $Q$ and energy, there would be no
"path" to the ground state and kinetics would be simply a random search through conformation
space, requiring the Levinthal time [M.




FIG. 3. $P(E - E_{\mathrm{gnd}}, Q)$ for maximally compact 27-mer conformations
($Q_{\max} = 28$), Ising interactions ($B_{ij} = \delta_{ij}$), and a sequence chosen to have a
low ground state energy.



IV. CONCLUSIONS 

While our discussion employs simplified models, such as lattice conformation spaces and a
Hamiltonian (1) which includes only pairwise interactions, the aspects which push the system
away from REM validity are model independent. Is heteropolymer freezing well described by REM?
This complicated question clearly depends on all aspects of the system in question. What is
clear is that REM validity is not guaranteed for all heteropolymers. For the models commonly
employed, it is notable that HP and MJM have extremely questionable REM validity.

Is protein folding well described by REM? From the point of view of eq. (2), interactions
between amino acids are in the HP/MJM class; biologically, the possibility of statistical
anti-dependence is quite intriguing, and perhaps sheds light on the evolutionary significance of
the nature of amino acid interactions. As for the nature of the space of protein conformations,
one must consider possible kinetic restrictions as well as the role of secondary structure,
which highly restricts the nature of conformational space, perhaps hindering REM. Thus, with
possible deviations from REM arising from both interactions and conformational space, REM
validity for proteins is questionable.